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Bayesian Survival Model based on Moment 
Characterization 


Julyan Arbel, Antonio Lijoi and Bernardo Nipoti 


Abstract Bayesian nonparametric marginal methods are very popular since they 
lead to fairly easy implementation due to the formal marginalization of the infinite¬ 
dimensional parameter of the model. However, the straightforwardness of these 
methods also entails some limitations: they typically yield point estimates in the 
form of posterior expectations, but cannot be used to estimate non-linear function¬ 
als of the posterior distribution, such as median, mode or credible intervals. This is 
particularly relevant in survival analysis where non-linear functionals such as e.g. 
the median survival time, play a central role for clinicians and practitioners. The 


main goal of this paper is to summarize the methodology introduced in Arbel et al 


( 2015| l for hazard mixture models in order to draw approximate Bayesian inference 
on survival functions that is not limited to the posterior mean. In addition, we pro¬ 
pose a practical implementation of an R package called momentify designed for 
moment-based density approximation, and, by means of an extensive simulation 
study, we thoroughly compare the introduced methodology with standard marginal 
methods and empirical estimation. 
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1 Introduction 


With marginal methods in Bayesian nonparametrics we refer to inferential proce¬ 
dures which rely on the integration (or marginalization) of the inhnite-dimensional 
parameter of the model. This marginalization step is typically achieved by means 
of the so-called Blackwell-MacQueen Polya urn scheme. We consider the popular 
example of the Dirichlet process ( [Fergusonl 19731 to illustrate the idea. Denote by 
Y = (Fi,... ,y„) an exchangeable sequence of random variables to which we assign 
as a prior distribution a Dirichlet process with mass parameter M and base measure 
Go, that is 


F|G~G, 


G'^DPiM,Go). 


The marginal distribution of Y, once G has been integrated out, can be derived from 
the set of predictive distributions for T,-, given (Ti,... for each i = 

In this case, such conditional distributions are linear combinations between the base 
measure Go and the empirical distribution of the conditioning variables and are ef¬ 
fectively described through a Polya urn sampling scheme. Marginal methods have 
played a major role in the success of Bayesian nonparametrics since the Polya urn 
generally leads to ready to use Markov chain Monte Carlo (MCMC) sampling strate¬ 
gies which, furthermore, immediately provide Bayesian point estimators in the form 
of posterior means. A popular example is offered by mixtures of the Dirichlet pro¬ 
cess for density estimation; for the implementation, see e.g. the R package DP- 
package by Jara et al ( 201 l| l. However, the use of marginal methods has important 
limitations that we wish to address here. Indeed, one easily notes that the posterior 
estimates provided by marginal methods are not suitably endowed with measures 
of uncertainty such as posterior credible intervals. Furthermore, using the posterior 
mean as an estimator is equivalent to choosing a square loss function which does 
not allow for other types of estimators such that median or mode of the posterior 
distribution. Finally, marginal methods do not naturally lead to the estimation of 
non-linear functionals of the distribution of a survival time, such as the median sur¬ 
vival time. For a discussion of these limitations see e.g. |Gelfand and Kottas| ( [2002) l. 

The present paper aims at proposing a new procedure that combines closed-form 
analytical results arising from the application of marginal methods with an approx¬ 
imation of the posterior distribution which makes use of posterior moments. The 
whole machinery is developed for the estimation of survival functions that are mod¬ 
eled in terms of hazard rate functions. To this end, let F denote the cumulative dis¬ 
tribution function (CDF) associated to a probability distribution on R+. If F is ab¬ 
solutely continuous, the corresponding survival function and cumulative hazard rate 
are dehned, respectively, by S{t) = 1 —F{t) and//(f) = — log(5(f)). Then, the haz¬ 
ard rate function is given by h(t) = —S'{t)/S{t). Let us recall that survival analysis 
has been a very active area of application of Bayesian nonparametric methodology: 
neutral to the right processes were used by |Doksum| ( | 1974 )1 as a prior for the CDF 
F, and beta processes by Hjort (19901 as a prior for the cumulative hazard function 
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H, both benefiting from useful conjugacy properties. Here we specify a prior on the 
hazard rate h. The most popular example is the gamma process mixture, originally 
proposed in [Dykstra and La^ ( |1981[ ). More general models have been studied in 
later work by Lo and Weng ( 1989| and James (2005 1 . Bayesian inference for these 
models often relies on a marginal method, see, e.g., Ishwaran and James (20041. 
Though quite simple to implement, marginal methods typically yield estimates of 
the hazard rate, or equivalently of the survival function, only in the form of posterior 
mean at a fixed time point. Working along the lines of Arbel et al ^015) 1, we show 
that a clever use of a moment-based approximation method does provide a relevant 
upgrade on the type of inference one can draw via marginal sampling schemes. We 
should stress that the information gathered by marginal methods is not confined to 
the posterior mean but is actually much richer and, if properly exploited, can lead to 
a more complete posterior inference. 

Let us briefly introduce Bayesian hazard mixture models. Random parameters, 
such as the hazard rate and survival function, are denoted with a tilde on top, e.g. 
Ji and S. We endow h with a prior distribution defined by the distribution of the 
random hazard rate (RHR) 


h{t) = [ k{t;y)fi{dy), (1) 

Jy 

where /I is a completely random measure (CRM) on Y = ]R+, and k( •; •) denotes a 
transition kernel on 1R+ x Y. Under suitable assumption on the CRM jl, we have 
lim,^.»/o/i(i)d s = oo with probability 1. Therefore we can adopt the following 
model 


Xi\P - P 


^((L°°)) = S{t) = exp ( - h{s)ds 


( 2 ) 


for a sequence of (possibly censored) survival data X = (Yi,...,Y„). In this set¬ 
ting, [Dykstra and Laud] ( |1981| l characterize the posterior distribution of the so- 
called extended gamma process: this is obtained when /i is a gamma CRM and 
k{t\y) = l(o,i](y)j3(y) for some positive right-continuous function p : 1R+ —> 1R+. 
The same kind of result is proved in Lo and Weng (19891 for weighted gamma pro¬ 
cesses corresponding to RHRs obtained when ft is still a gamma CRM and k( •; •) is 
an arbitrary kernel. Finally, a posterior characterization has been derived by jJamesj 
( 2005| l for any CRM p. and kernel k( •; •). 

The rest of the paper is organized as follows. In Section]^ we provide the closed- 
form expressions for the posterior moments of the survival function. We then show 
in Sectionj^how to exploit the expression for the moments to approximate the cor¬ 
responding density function and sample from it. Finally, in Section]^ we study the 
performance of our methodology by means of an extensive simulation study with 
survival data. 
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2 Moments of the posterior survival function 


Closed-form expressions for the m oments of any orde r of the posterior survival 
curve S{t) at any t are provided in Arbel et al (2015 i. For a complete account, 


we recall the result hereafter. We first need to introduce some notation. A useful 
augmentation suggests introducing latent random variables Y — {Yi,...,Y„) such 
that, building upon the posterior characterization derived by James| ( |2005[ ), we can 
derive expressions for the posterior moments of the random variable S{t), where 
f is fixed, conditionally on X and Y. To this end, define Kx{y) = /q A:(s;y)ds and 
Kxiy) = T!i=\Kxj(y)- Also, the almost sure discreteness of fx implies there might 
be ties among the y,’s with positive probability. Therefore, we denote the distinct 
values among Y by (Tj* ,..., ), where k <n, and, for any 7 = 1,..., A:, we define 

Cj = |/ : T/ = Yj I and nj = #Cj. We can now state the following result. 

Proposition 1. Denote by v(di,dy) = p(i)dicf’o(dy) the Levy intensity of the com¬ 
pletely random measure ft. Then for every t > 0 and r >0, 

E[5''(f)|X,y] =exp|- / fl-e-''^'(3'A')e-^^MW(di,dy) 

L 2R+x¥ ^ J 

exp {-5 {rK,{Y*)+Kx{Y*)) ] s’^ip{s)ds, (3) 


1 

X FT — 




where B j = s"J exp | -sKx (Y* )jp(s)ds,for J ^ 1,..., k. 

For evaluating the posterior moments E[5'^(f)|X] by means of Proposition 
we use a Gibbs sampler which proceeds by alternately sampling, at each iteration 
i= 1,..., L, from the full conditional distributions of the latent variables Y and the 
parameters of the model, and evaluate E[5''(f) \X,Y]^^'l at each step. For an exhaus¬ 
tive description of the posterior sampling and the expression of the full conditional 
distributions, see Arbel et al ( 2015| l. The remaining of the paper is devoted to illus¬ 
trating how the characterization of the moments provided by Proposition [T] can be 
used to approximate a density function and, in turn, to carry out Bayesian inference. 


3 Moment-based density approximation 


The aim is to recover the posterior distribution of the random variable S{t) for any 
fixed f, based on the knowledge of its moments E[5''(f) jif] obtained from Propo¬ 
sition [T] In order to simplify the notation, let us consider a generic continuous 
random variable S on [0,1], and denote by / its density, and its raw moments by 
7 r = E [ 5 ''], with r S IN. Recovering / from the explicit knowledge of its moments 
Jr is a classical problem in probability and statistics that has received great atten¬ 
tion in the literature. See e.g. Provost P005|l and references and motivating appli- 
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cations therein. A very general approach relies on the basis of Jacobi polynomials 
They constitute a broad class which includes, among oth¬ 
ers, Legendre and Chebyshev polynomials, and which is well suited for the expan¬ 
sion of densities with compact support (see Provost| 2005) 1. Any univariate density / 
supported on [0,1] can be uniquely decomposed on such a basis and therefore there 
is a unique sequence of real numbers (A,),>o such that f{s) = Wa ^,(s) A,G, (s) 
where Wa^b (■*) = ■*“ * (1 — * is named the weight function of the basis and is pro¬ 

portional to a beta density in the case of Jacobi polynomials. From the evaluation 
of Jq f{s) G, (i) di it follows that each A,- coincides with a linear combination of the 
hrst i moments of S, specihcally A,- = Lr=o Gijjr- Then, the polynomial approxima¬ 
tion method consists in truncating the representation of / in the Jacobi basis at a 
given level i — N. This procedure leads to a methodology that makes use only of the 
hrst N moments and provides the approximation 


N 


fff(s) = Wa,bis) Y, Y Gfs). 

i=0 \r=0 / 


(4) 


It is important to stress that the polynomial approximation 0 is not necessarily a 
density as it might fail to be positive or to integrate to 1. In order to overcome this 
problem, we consider the density 7Z proportional to the positive part of Q dehned by 
n{s) max(/jv(i),0). We resort to the rejection sampler for sampling from n. This 
is a method for drawing independently from a distribution proportional to a given 
non-negative function, that exempts us from computing the normalizing constant 
corresponding to n. More precisely, the method requires to pick a proposal distri¬ 
bution p for which there exists a positive constant M such that n < Mp. A natural 
choice for p is the beta distribution proportional to the weight function Wafi- Ap¬ 
proximation Q and the rejection sampler were implemented in R. For the purpose 
of the paper, we have wrapp ed up t he corresponding code in an R package called 


momentifyQn Sections [rilandl |3.2|we briefly describe the package implementation 


and give a simple working example. 


3.1 Package implementation 

The major function in package momentify is called moment if y and allows for 
(i) approximating a density based on its moments, and (ii) sampling from this ap¬ 
proximate distribution by using the rejection sampler. The synopsis of the function, 
together with default values, are given by 

momentify(moments, N_moments = length(moments), 
N_sim = 1000, xgrid = seq(0, 1, length = 200)) 


* The momentify package can be downloaded from the first author’s (JA) webpage 
http://www.crest.fr/pagesperso.php?user=313 0 
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The only required argument is moments, a cZ-dimensional vector, with d>2, com¬ 
posed by the values of the first d consecutive raw moments. The remaining argu¬ 
ments are optional: N_moments corresponds to N, the number of moments to be 
used (where N < d), N_sim is the size of the sample obtained by the rejection 
sampler, and xgrid denotes the grid on which the density is to be approximated. 

The function returns a list, say res, with the following components: xgrid, 
defined in argument, approx_density, the approximated density evaluated on 
xgrid, and psample, the sample obtained from approx_density by the re¬ 
ject algorithm. The class of the output list res is called momentify. For visu¬ 
alizing the output res, two method functions can be readily applied to this class, 
namely plot (res, . . . ) and hist (res, . . .). 


3.2 Simulated example 


We assess now the quality of this approximation procedure on a particular example 
by means of a practical implementation of the momentify package. We specify the 
distribution of the random variable 5 by a mixture, with weights of 1/2, of beta 
distributions of parameters {a,b) = (3,5) and {c,d) = (10,3). The raw moments of 
any order of 5 can be explicitly evaluated by 


7r = MS’-] 


^(r) 

(a -t/?)(,) 


(c-I-£/)(,) 


where = r{x + r)/r{x). As describe above, given a vector of N moments 
(7 i,... ,Yn), the introduced package allows us to approximately evaluate the den¬ 
sity Q and, in turn, to compare it with the true density. The corresponding code for 
A = 2 ,..., 10 is the following: 

rfun=function(n){bin=rbinom(n,1,.5) 

bin*rbeta (n, 3,5) -I- (1-bin) *rbeta (n, 10, 3) } 
true_density=function(n){.5*dbeta(n,3,5)+ 

.5*dbeta(n,10,3)} 

sim_data = rfun(10''5) 
moments = mean(sim_data) 
for (i in 2 :10) { 

moments = c (moments, mean (sim_data''i) ) 

res = momentify(moments = moments) 

plot (res, main = paste("N =",i)) 

curve(true_density(x),add=TRUE, col = "red") 

} 

The graphical output is given in Figure[T] We can see that four moments are needed 
in order to capture the two modes of the distribution, although coarsely. From seven 
moments onward, the fit is very good since the two curves are hardly distinguish¬ 
able. Following this example as well as other investigations not reported here, we 
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choose, as a rule of thumb, to work with = 10 moments. A more elaborated nu¬ 
merical study is presented in Arbel et al (2015|l in the context of survival analysis. 






Fig. 1 Output of momentify R package. True density f of S (in red) and approximated density /m 
(in black) involving an increasing number of moments, from N — 2 (top left) to = 10 (bottom 
right). 
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4 Bayesian inference 

4.1 Estimation of functionals of S 

Given a sample of survival times X = {Xi,...,X„}, we estimate the first N mo¬ 
ments of the posterior distribution of S{t), for f on a grid of q equally-spaced points 
{fi,... ,f^} in an interval [0,M] by using the Gibbs sampler succinctly described in 
Section]^ We then exploit the estimated moments to sample from an approximation 
of the posterior distribution of 5(f,) for according to the methodology set 

forth in Section]^ This allows us to carry out Bayesian inference, with a focus on 
the estimation of the median survival time and, for any given t in the grid, of credible 
intervals for S{t). The same approach can be easily used to estimate the posterior 
median and mode of 5(f) at any given f, and, in line of principle, any functional of 
interest. Let us first consider the median survival time that we denote by m. The iden¬ 
tity for the cumulative distribution function of m, P (m < f |X) = P(5(f) < l/2\Xf 
allows us to evaluate the CDF of m at each time point f, as q = P(5(f,) < 1/2|X'). 
Then, we can estimate the median survival time m by means of the following ap¬ 
proximation: 



(5) 


where the subscript X in indicates that the integral is with respect to the 

distribution of §{■) conditional to X. Moreover, the sequence can be used 

to devise credible intervals for the median survival time. Similarly, the posterior 
samples generated by the rejection sampler can be easily used to devise, f-by-f, 
credible intervals for 5(f) or to estimate other functionals that convey meaningful 
information such as the posterior mode and median. In Section |4.2[ we apply this 
methodology in a study involving simulated survival data where we compare the 
performance of the moment-based methodology with standard marginal methods. 


4.2 Applications 

For the purpose of illustration, we complete the model specification by assuming 
a Dykstra and Laud type of kernel k(f;y) = l(o/](y)j3, for some constant > 0, a 
gamma CRM p. and an exponential base measure Pq with rate parameter 3. More¬ 
over, for the hyperparameters c and p we choose independent gamma prior dis¬ 
tributions with shape parameter 1 and rate parameter 1/3. We then consider three 
samples X = (2fi,... ,Xn) of size n = 20,100,500 from a Weibull distribution of pa¬ 
rameters (2,2) whose survival function is 5o(f) = exp(—f^/4). We set M — 6 (the 
largest observation in the samples is 5.20) and q = 50 for the analysis of each sam¬ 
ple. We approximately evaluate, f-by-f, the posterior distribution of 5(f) together 
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with the posterior distribution of the median survival time m. By inspecting the bot¬ 
tom row of Figure]^ we can appreciate that the estimated credible intervals for S{t) 
contain the true survival function. Moreover, the posterior distribution of the median 
survival time m (blue curve) is nicely concentrated around the true value niQ. When 
relying on marginal methods, the most natural choice for quantifying the uncertainty 
of posterior estimates consists in considering the quantile intervals corresponding to 
the output of the Gibbs sampler, that we refer to as marginal intervals. This leads to 
consider, for any fixed f, the interval whose lower and upper extremes are the quan¬ 
tiles of order e.g. 0.025 and 0.975, respectively, of the sample of posterior means 
(E [5(f) IX, y] j ^ obtained, conditional on y, by the Gibbs sampler described 

in Section]^ In the middle row of Figure]^ we have compared the estimated 95% 
credible intervals for S{t) (black) and the marginal intervals corresponding to the 
output of the Gibbs sampler (dashed blue). In this example, the credible intervals 
in general contain the true survival function So{t), while this does not hold for the 
marginal intervals. This fact suggests that the marginal method tends to underesti¬ 
mate the uncertainty associated to the posterior estimates, and can be explained by 
observing that, since the underlying CRM is marginalized out, the intervals arising 
from the Gibbs sampler output capture only the variability of the posterior mean that 
can be traced back to the latent variables y and the parameters (c,)3). As a result, 
especially for a small sample size, the uncertainty detected by the marginal method 
leads to marginal intervals that can be significantly narrower than the actual pos¬ 
terior credible intervals that we approximate through the moment-based^proach. 
The Kaplan-Meier estimates of S{t) are plotted on the top row of Figure® 


Table 1 Comparison of the median survival time estimated by means of the moment-based 
method, m, by means of the marginal method, )m„, and the empirical median survival time m^, for 
different sample sizes n. For the moment-based estimation we show m, the absolute error \m— mo| 
and 95% credible interval (C/); for the marginal method we show rh„,, the absolute error |m„, —mo| 
and the 95% marginal interval (€!„,)', last two columns show the empirical estimate ihe and the 
corresponding absolute error \mg — mo|. The true median survival time is mo = 2Clog2 « 1.665. 



moment-based method 


Marginal 


Empirical 

n 

m 

|m-mo| 

Cl 

ihm 

|> 

1 

S 

o 

Cljn 

me 1 

me — mo 1 

20 

1.26 

0.40 

0.88 - 1.80 

1.24 

0.43 

1.09 - 

1.57 

0.95 

0.71 

100 

1.52 

0.14 

1.33 - 1.76 

1.51 

0.16 

1.45 - 

1.66 

1.55 

0.12 

500 

1.66 

0.01 

1.57 - 1.77 

1.66 

0.01 

1.63 - 

1.75 

1.69 

0.03 


As described in Section [4~T| the moment-based approach enables us to approxi¬ 
mate the posterior distribution of the median survival time m (in blue in the bottom 
row of Figure]^. This, in turn, can be used to derive sensible credible intervals for 
m. On the other side, when relying on marginal methods, the posterior of the me¬ 
dian survival time is not available per se. However, in the same way as we defined 
marginal intervals in place of credible intervals for the survival function S{t), for 
every f; the Gibbs sample (E[5(f,) |X,y](^))^_j ^ can be used as a proxy of a pos¬ 

terior sample for 5(f,) in order to provide the following approximation of the CDF 
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Fig. 2 The true survival function So{t) is the red line in all plots. Bottom row: estimated posterior 
mean (black solid line) with 95% credible intervals for S{t) (blaek thin lines); in blue the posterior 
distribution of the median survival time m. Middle row: comparison of the 95% credible interval 
(black line) with the marginal interval (dashed blue line). Top row: Kaplan-Meier estimate (green 
line). Sample size n = 20 (left column), n = 100 (middle column), n = 500 (right column). 
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Fig. 3 Comparison of credible intervals for the median survival time m obtained with the moment- 
based approach (black line, below for each n) and marginal intervals (blue line, above for each 
n), for varying sample size n. The dots indicate the estimators (m in black, m„, in blue and ifig in 
green). The true median mo = 2-^log2 1.665 is indicated by the vertical red dashed line. 


of m: 


P (m < t\X) « ^#{£ : E[5(f) |X’,y]W < 1 /2}. 


(6) 


As in 0, an estimator for the median survival time can be obtained as the mean of 
the distribution whose CDF is given in (|^. We call such estimator m„, to denote the 
fact that it is obtained by means of a marginal method. Similarly, from (|^, marginal 
intervals for can be derived as described in Section 4.1 Finally, we denote by ifig 
the empirical estimator of m and by mp = 2v/log2 « 1.665 the true median survival 
time. We summarize the estimates we obtained for the median survival time m in 
Figurej^and in Table[2 For all the sample sizes considered, the credible intervals for 
m contain the true value. Moreover, as expected, when n grows they shrink around 
mo: for instance the length of the interval reduces from 0.92 to 0.20 when the sample 
size n increases from 20 to 500. As observed for the marginal intervals S{t) at a given 
f, the marginal intervals for m„ obtained with the marginal method and described 
in Equation (|^ are in general narrower than the credible intervals obtained by the 
moment-based approach. Moreover, in this example, they contain the true mo only 
for n — 500. This observation suggests that the use of intervals produced by marginal 
methods as proxies for posterior credible intervals should be avoided, especially for 
small sample sizes. 
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